373 research outputs found

    Quality of Experience (QoE)-Aware Fast Coding Unit Size Selection for HEVC Intra-prediction

    Get PDF
    The exorbitant increase in the computational complexity of modern video coding standards, such as High Efficiency Video Coding (HEVC), is a compelling challenge for resource-constrained consumer electronic devices. For instance, the brute force evaluation of all possible combinations of available coding modes and quadtree-based coding structure in HEVC to determine the optimum set of coding parameters for a given content demand a substantial amount of computational and energy resources. Thus, the resource requirements for real time operation of HEVC has become a contributing factor towards the Quality of Experience (QoE) of the end users of emerging multimedia and future internet applications. In this context, this paper proposes a content-adaptive Coding Unit (CU) size selection algorithm for HEVC intra-prediction. The proposed algorithm builds content-specific weighted Support Vector Machine (SVM) models in real time during the encoding process, to provide an early estimate of CU size for a given content, avoiding the brute force evaluation of all possible coding mode combinations in HEVC. The experimental results demonstrate an average encoding time reduction of 52.38%, with an average Bjøntegaard Delta Bit Rate (BDBR) increase of 1.19% compared to the HM16.1 reference encoder. Furthermore, the perceptual visual quality assessments conducted through Video Quality Metric (VQM) show minimal visual quality impact on the reconstructed videos of the proposed algorithm compared to state-of-the-art approaches

    Virtual Frames as Long-Term Reference Frames for HEVC Inter-Prediction

    Get PDF
    High Efficiency Video Coding(HEVC) employs both past or future frames when encoding the current frame in a video sequence. This paper proposes a framework for using virtual reference frames, to achieve increased coding gains in the long-term for repetitive scenes in static camera scenarios

    A Decoding-Complexity and Rate-Controlled Video-Coding Algorithm for HEVC

    Get PDF
    Video playback on mobile consumer electronic (CE) devices is plagued by fluctuations in the network bandwidth and by limitations in processing and energy availability at the individual devices. Seen as a potential solution, the state-of-the-art adaptive streaming mechanisms address the first aspect, yet the efficient control of the decoding-complexity and the energy use when decoding the video remain unaddressed. The quality of experience (QoE) of the end-users’ experiences, however, depends on the capability to adapt the bit streams to both these constraints (i.e., network bandwidth and device’s energy availability). As a solution, this paper proposes an encoding framework that is capable of generating video bit streams with arbitrary bit rates and decoding-complexity levels using a decoding-complexity–rate–distortion model. The proposed algorithm allocates rate and decoding-complexity levels across frames and coding tree units (CTUs) and adaptively derives the CTU-level coding parameters to achieve their imposed targets with minimal distortion. The experimental results reveal that the proposed algorithm can achieve the target bit rate and the decoding-complexity with 0.4% and 1.78% average errors, respectively, for multiple bit rate and decoding-complexity levels. The proposed algorithm also demonstrates a stable frame-wise rate and decoding-complexity control capability when achieving a decoding-complexity reduction of 10.11 (%/dB). The resultant decoding-complexity reduction translates into an overall energy-consumption reduction of up to 10.52 (%/dB) for a 1 dB peak signal-to-noise ratio (PSNR) quality loss compared to the HM 16.0 encoded bit streams

    Wireless End-to-End Image Transmission System using Semantic Communications

    Full text link
    Semantic communication is considered the future of mobile communication, which aims to transmit data beyond Shannon's theorem of communications by transmitting the semantic meaning of the data rather than the bit-by-bit reconstruction of the data at the receiver's end. The semantic communication paradigm aims to bridge the gap of limited bandwidth problems in modern high-volume multimedia application content transmission. Integrating AI technologies with the 6G communications networks paved the way to develop semantic communication-based end-to-end communication systems. In this study, we have implemented a semantic communication-based end-to-end image transmission system, and we discuss potential design considerations in developing semantic communication systems in conjunction with physical channel characteristics. A Pre-trained GAN network is used at the receiver as the transmission task to reconstruct the realistic image based on the Semantic segmented image at the receiver input. The semantic segmentation task at the transmitter (encoder) and the GAN network at the receiver (decoder) is trained on a common knowledge base, the COCO-Stuff dataset. The research shows that the resource gain in the form of bandwidth saving is immense when transmitting the semantic segmentation map through the physical channel instead of the ground truth image in contrast to conventional communication systems. Furthermore, the research studies the effect of physical channel distortions and quantization noise on semantic communication-based multimedia content transmission.Comment: Accepted for IEEE Acces

    Video compression by chroma prediction using semantic communications

    Get PDF
    Conventional video coding is evolving to meet unprecedented consumer device requirements, but the statistical signal processing based approach may find limitations in handling new media contents. Deep neural network and semantic communication based video compression systems show potential to be used as video encoders and decoders, but reaching the rate distortion performance of state-of-the-art conventional video coding systems remains to be achieved. A novel video compression system by predicting the chroma components of video using the semantically encoded luma component and reference intra-coded frames is proposed and tested against high efficiency video coding (HEVC) for bit rate comparison and rate-distortion performance evaluation. The proposed system demonstrated 18% to 30% saving of bit rate for high and medium motion videos without significant reductions of rate-distortion with the saving increasing with higher group of picture sizes, but low motion videos only demonstrated negligible savings

    Static video compression's influence on neural network performance

    Get PDF
    The concept of action recognition in smart security heavily relies on deep learning and artificial intelligence to make predictions about actions of humans. To draw appropriate conclusions from these hypotheses, a large amount of information is required. The data in question are often a video feed, and there is a direct relationship between increased data volume and more-precise decision-making. We seek to determine how far a static video can be compressed before the neural network's capacity to predict the action in the video is lost. To find this, videos are compressed by lowering the bitrate using FFMPEG. In parallel, a convolutional neural network model is trained to recognise action in the videos and is tested on the compressed videos until the neural network fails to predict the action observed in the videos. The results reveal that bitrate compression has no linear relationship with neural network performance

    Deep learning and bidirectional optical flow based viewport predictions for 360° video coding

    Get PDF
    The rapid development of virtual reality applications continues to urge better compression of 360° videos owing to the large volume of content. These videos are typically converted to 2-D formats using various projection techniques in order to benefit from ad-hoc coding tools designed to support conventional 2-D video compression. Although recently emerged video coding standard, Versatile Video Coding (VVC) introduces 360° video specific coding tools, it fails to prioritize the user observed regions in 360° videos, represented by the rectilinear images called the viewports. This leads to the encoding of redundant regions in the video frames, escalating the bit rate cost of the videos. In response to this issue, this paper proposes a novel 360° video coding framework for VVC which exploits user observed viewport information to alleviate pixel redundancy in 360° videos. In this regard, bidirectional optical flow, Gaussian filter and Spherical Convolutional Neural Networks (Spherical CNN) are deployed to extract perceptual features and predict user observed viewports. By appropriately fusing the predicted viewports on the 2-D projected 360° video frames, a novel Regions of Interest (ROI) aware weightmap is developed which can be used to mask the source video and introduce adaptive changes to the Lagrange and quantization parameters in VVC. Comprehensive experiments conducted in the context of VVC Test Model (VTM) 7.0 show that the proposed framework can improve bitrate reduction, achieving an average bitrate saving of 5.85% and up to 17.15% at the same perceptual quality which is measured using Viewport Peak Signal-To-Noise Ratio (VPSNR)

    Region of interest scalable image compression using semantic communications

    Get PDF
    Growing consumer demand for media content over a wide range of devices has made scalable image compression vital in today’s media landscape. Image compression is conventionally achieved by means of statistical signal processing, but since recently, deep learning techniques are seen to be widely as well. Capabilities of such systems also enable accurate identification of regions of interest in images, leading optimised performance in most applications. This paper proposes a region-of-interest scalable image compression system using semantic communications, where an autoencoder-based semantic encoder performs the base level compression, while a Semantic Mask Extracting Transformer (SeMExT) enables identification of regions of interest to create enhancement layers with different quality levels using a scalable JPEG encoder. When benchmarked against scalable JPEG across a variety of images, the proposed system demonstrates significantly improved compressive performance. The base layer achieved 61.4 times more compression on average, along with better rate-distortion performance at any given quality level

    Spectral Ranking in Complex Networks Using Memristor Crossbars

    Get PDF
    Various centrality measures have been proposed to identify the influence of each node in a complex network. Among the most popular ranking metrics, spectral measures stand out from the crowd. They rely on the computation of the dominant eigenvector of suitable matrices related to the graph: EigenCentrality, PageRank, Hyperlink Induced Topic Search (HITS) and Stochastic Approach for Link-Structure Analysis (SALSA). The simplest algorithm used to solve this linear algebra computation is the Power Method. It consists of multiple Matrix-Vector Multiplications (MVMs) and a normalization step to avoid divergent behaviours. In this work, we present an analog circuit used to accelerate the Power Iteration algorithm including current-mode termination for the memristor crossbars and a normalization circuit. The normalization step together with the feedback loop of the complete circuit ensure stability and convergence of the dominant eigenvector. We implement a transistor level peripheral circuitry around the memristor crossbar and take non-idealities such as wire parasitics, source driver resistance and finite memristor precision into account. We compute the different spectral centralities to demonstrate the performance of the system. We compare our results to the ones coming from the conventional digital computers and observe significant energy savings while maintaining a competitive accuracy
    • …
    corecore